5.306 pbsnodes with NUMA-Awareness

 

When Torque is configured with both NUMA-awareness and --enable-cgroups, the total and available numbers of sockets, numachips (NUMA nodes), cores, and threads are returned whenever Moab queries the status of nodes (a call is made to pbsnodes).

The example output that follows shows a node with two sockets, four numachips, 16 cores and 32 threads. In this example, no jobs are currently running on this node; therefore, the available resources are the same as the total resources.

torque-devtest-01
     state = free
     power_state = Running
     np = 16
     ntype = cluster
     status =
rectime=1412732948,macaddr=00:26:6c:f4:66:a0,cpuclock=Fixed,varattr=,jobs=,state=free,netload=17080856592,gres=,loadave=10.74,ncpus=16,physmem=49416100kb,availmem=50056608kb,totmem=51480476kb,idletime=29,nusers=2,nsessions=3,sessions=8665
8671 1994,uname=Linux torque-devtest-01 2.6.32-358.el6.x86_64 #1 SMP
Fri Feb 22 00:31:26 UTC 2013 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 2
     total_numachips = 4
     total_cores = 16
     total_threads = 32
     available_sockets = 2
     available_numachips = 4
     available_cores = 16
     available_threads = 32
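
The counters in output like this can be pulled out with a few lines of scripting. The following is a minimal Python sketch, not part of Torque: the function name parse_numa_counts and the shortened SAMPLE text are illustrative assumptions, and the sketch assumes the "name = value" attribute layout shown above.

```python
import re


def parse_numa_counts(pbsnodes_text):
    """Collect the total_* and available_* integer attributes from pbsnodes output."""
    counts = {}
    for line in pbsnodes_text.splitlines():
        # Attribute lines look like "     total_cores = 16".
        match = re.match(r"\s*((?:total|available)_\w+)\s*=\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Shortened copy of the sample output above (hypothetical input for this sketch).
SAMPLE = """\
torque-devtest-01
     state = free
     np = 16
     total_sockets = 2
     total_numachips = 4
     total_cores = 16
     total_threads = 32
     available_sockets = 2
     available_numachips = 4
     available_cores = 16
     available_threads = 32
"""

counts = parse_numa_counts(SAMPLE)

# The node is completely idle when every available_* value equals its total_* value.
idle = all(
    counts["available_" + level] == counts["total_" + level]
    for level in ("sockets", "numachips", "cores", "threads")
)
print(counts["available_cores"], idle)
```

For the idle node shown above, every available value equals the corresponding total, so idle evaluates to True.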

If a job requesting only a single core is then started on this node, the pbsnodes output looks like this:

torque-devtest-01
     state = free
     power_state = Running
     np = 16
     ntype = cluster
     jobs = 0/112.torque-devtest-01
     status =
rectime=1412732948,macaddr=00:26:6c:f4:66:a0,cpuclock=Fixed,varattr=,jobs=,state=free,netload=17080856592,gres=,loadave=10.74,ncpus=16,physmem=49416100kb,availmem=50056608kb,totmem=51480476kb,idletime=29,nusers=2,nsessions=3,sessions=8665
8671 1994,uname=Linux torque-devtest-01 2.6.32-358.el6.x86_64 #1 SMP
Fri Feb 22 00:31:26 UTC 2013 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 2
     total_numachips = 4
     total_cores = 16
     total_threads = 32
     available_sockets = 1
     available_numachips = 3
     available_cores = 15
     available_threads = 30

This output shows that even though only one core was requested, the available sockets, numachips, cores, and threads were all reduced. This is because the NUMA architecture is hierarchical: a socket contains one or more numachips; a numachip contains two or more cores; a core contains one or more threads (one thread in the case of non-threaded cores). For a resource to count as available, the entire resource must be free. When a job uses one core, that core consumes part of its parent numachip and part of its parent socket. As a result, the affected socket and numachip cannot be used when subsequent jobs request sockets or numachips as resources. Also, because the job asked for one core, all of the threads of that core are consumed, so the number of threads available on the machine is reduced by the number of threads in the core (two in this example).
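
The counting rule can be illustrated with a small model. The following Python sketch is a toy illustration only and is not how Torque tracks resources internally. The topology constants and the function name available_counts are assumptions, chosen to match the sample node (2 sockets, 2 numachips per socket, 4 cores per numachip, 2 threads per core).

```python
# Toy topology matching the sample node (assumed symmetric layout).
SOCKETS = 2
CHIPS_PER_SOCKET = 2
CORES_PER_CHIP = 4
THREADS_PER_CORE = 2


def available_counts(busy_cores):
    """Return availability counters for a set of busy (socket, numachip, core) tuples."""
    free_sockets = free_chips = free_cores = 0
    for s in range(SOCKETS):
        socket_is_free = True
        for n in range(CHIPS_PER_SOCKET):
            free_here = sum(
                (s, n, c) not in busy_cores for c in range(CORES_PER_CHIP)
            )
            free_cores += free_here
            # A numachip counts as available only if every one of its cores is free.
            if free_here == CORES_PER_CHIP:
                free_chips += 1
            else:
                socket_is_free = False
        # A socket counts as available only if every one of its numachips is free.
        if socket_is_free:
            free_sockets += 1
    return {
        "available_sockets": free_sockets,
        "available_numachips": free_chips,
        "available_cores": free_cores,
        # A busy core takes all of its threads with it.
        "available_threads": free_cores * THREADS_PER_CORE,
    }


idle = available_counts(set())
one_core = available_counts({(0, 0, 0)})  # one job using a single core
print(idle)
print(one_core)
```

With one busy core, the model reports 1 socket, 3 numachips, 15 cores, and 30 threads available, which matches the pbsnodes output above.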

As another example, suppose a user submits a job request that asks for a socket. The pbsnodes output will look like this:

torque-devtest-01
     state = free
     power_state = Running
     np = 16
     ntype = cluster
     jobs = 113.torque-devtest-01
     status =
rectime=1412732948,macaddr=00:26:6c:f4:66:a0,cpuclock=Fixed,varattr=,jobs=,state=free,netload=17080856592,gres=,loadave=10.74,ncpus=16,physmem=49416100kb,availmem=50056608kb,totmem=51480476kb,idletime=29,nusers=2,nsessions=3,sessions=8665
8671 1994,uname=Linux torque-devtest-01 2.6.32-358.el6.x86_64 #1 SMP
Fri Feb 22 00:31:26 UTC 2013 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 2
     total_numachips = 4
     total_cores = 16
     total_threads = 32
     available_sockets = 1
     available_numachips = 2
     available_cores = 8
     available_threads = 16

This output shows that not only are the available sockets reduced to one, but all of the numachips, cores, and threads associated with the reserved socket are no longer available either. In other words, requesting a job placement of "socket" reserves all of the resources of that socket, and none of them are available to other jobs.
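
The arithmetic behind these numbers can be checked with a short calculation. The following Python sketch is illustrative only: the function name after_socket_placement is hypothetical, and the sketch assumes that every socket holds an equal share of the node's numachips, cores, and threads, which holds for the sample node but may not hold for every machine.

```python
def after_socket_placement(totals, sockets_used):
    """Subtract the share of each resource that belongs to the reserved sockets.

    Assumes every socket holds an equal share of numachips, cores, and threads.
    """
    n_sockets = totals["sockets"]
    if not 0 <= sockets_used <= n_sockets:
        raise ValueError("sockets_used must be between 0 and the socket count")
    available = {}
    for level, total in totals.items():
        # Share of this resource that sits inside a single socket.
        per_socket = total // n_sockets
        available[level] = total - per_socket * sockets_used
    return available


# Totals taken from the sample node in this section.
totals = {"sockets": 2, "numachips": 4, "cores": 16, "threads": 32}
result = after_socket_placement(totals, 1)
print(result)
# {'sockets': 1, 'numachips': 2, 'cores': 8, 'threads': 16}
```

Reserving one of the two sockets removes half of every resource, which is what the available values in the output above show.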

© 2017 Adaptive Computing